Abstract

Because previous reports have suggested that the lowered validity of
tests scored with empirical option weights might be explained by a
capitalization of the keying procedures on omitting tendencies, a procedure
was devised to key options empirically with a "correction-for-guessing"
constraint. Use of the new procedure with Graduate Record Examinations (GRE)
data resulted in smaller increases in reliability than those observed when
unconstrained procedures were used, but validities for quantitative subforms
were not appreciably lowered. Validities for verbal subforms were lowered
slightly, however.



Empirical Option Weighting with a Correction for Guessing 

Richard R. Reilly 
Educational Testing Service 

Two recent reports (Hendrickson, 1971; Reilly & Jackson, 1972) have
suggested that weighting options empirically results in substantial increases
in reliability and test homogeneity, but at the expense of lowered test
validity. These findings are at variance with those reported in an earlier
study by Davis and Fifer (1959), who found similar increases in reliability
and slight increases in validity when options were weighted empirically. All
three studies employed modifications of a weighting technique originally known
as The Method of Reciprocal Averages (Mosier, 1946) which, in effect, maximizes
the product-moment correlation between item scores and criterion scores by
assigning to each item-option values proportional to the mean criterion score
for all individuals choosing that option.
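
As a rough illustration of the weighting idea just described, the following Python sketch assigns each option of a single item the mean criterion score of the examinees who chose it. The function name, option labels, and data are invented for illustration; this is a minimal sketch of the principle, not the full procedures used in the cited studies.

```python
from collections import defaultdict


def option_mean_weights(responses, criterion):
    """Return {option: mean criterion score of examinees choosing it}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for option, score in zip(responses, criterion):
        totals[option] += score
        counts[option] += 1
    return {option: totals[option] / counts[option] for option in totals}


# Hypothetical responses to one item and the matching criterion scores.
responses = ["A", "B", "A", "omit", "C", "B"]
criterion = [30.0, 20.0, 34.0, 12.0, 18.0, 24.0]
weights = option_mean_weights(responses, criterion)
```

In this made-up data the "omit" category receives the lowest weight simply because the one examinee who omitted has the lowest criterion score, which is the kind of pattern the studies discussed next report.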

A key difference between the Davis and Fifer study and the first two
mentioned was that tests in the first two were administered with formula
score instructions while Davis and Fifer instructed examinees to attempt
every item. Thus, Hendrickson and Reilly and Jackson had an additional
"option," that of omit. Hendrickson, reporting on the weights generally
assigned to the omit category, comments, "...An interesting finding of this
study was that the weight of 'omit' was almost always lower than any of the
other distracters in an item..." (Hendrickson, 1971). Reilly and Jackson
(1972) take this a step further and suggest that, "...the empirical keying
procedures described capitalize on the tendency to omit and... while this
tendency is reliable, it is not valid."

Because of these suggestions, it was decided to devise and test a
procedure which weighted options subject to the constraint that the weight
for omit equal the mean weight for the options. The rationale is similar
to that used in the usual formula scoring method in that it assumes that an
individual omitting an item should receive the expected weight under
conditions of random response to that item.

In order to determine the optimum weights for a single item, subject to
the "correction-for-guessing" constraint, the following objective function was
set up:

$$F = \sum_j \sum_i (y_{ij} - w_j)^2 - 2\lambda \Big[ (k - 1) w_p - \sum_j \delta_j w_j \Big] ,$$

where

$y_{ij}$ denotes the criterion score of the $i$th individual making the
$j$th response;

$w_j$ is the weight for the $j$th response, $j = 1, \ldots, p, \ldots, k$;

$w_p$ is the weight for the omit category;

$\delta_j$ = one for $j \neq p$, and zero otherwise; and

$\lambda$ is the Lagrange multiplier.

Taking partial derivatives and solving for the weights which minimize
the function, we find that the solution, which requires a small
(k - 1) x (k - 1) matrix inversion, has the following properties (see
appendix): (1) The mean item score over all individuals is equal to the mean
criterion score; (2) the weights arrived at are proportional to the weights
which will maximize the correlation between the item and the criterion subject
to the constraint of a fixed item variance (and, of course, the constraint
that the omit weight equals the mean of the option weights); (3) unlike the
unconstrained option weights, the weights arrived at will not, in general,
yield the maximum possible product-moment correlation; (4) for unconstrained
weights it has been pointed out (Stanley & Wang, 1970) that a slope of 1.0 and
a zero intercept will describe the regression of the criterion scores on the
item scores. The appropriate slope for the regression of criterion scores on
item scores yielded by the new method will not, in general, be 1.0, nor will
the appropriate intercept, in general, be zero.

Procedure 

Two parallel forms each of the verbal (denoted as V_1 and V_2) and
quantitative (Q_1 and Q_2) sections of the GRE were devised by assigning
one-half of the items on each section to each of the two special parallel
forms. Forms V_1 and V_2 consisted of 50 items each, while forms Q_1 and Q_2
consisted of 27 items each. It should be noted that the two forms in
each set, since they were constructed from operational tests, were not
administered under separate time limits. Because of practical limitations the
more desirable procedure of administering the two parallel forms under
separately timed conditions was not possible.

Data were the same as those used in the Reilly and Jackson (1972) study.
A spaced sample (i.e., a sample consisting of every nth answer sheet) of
5,000 answer sheets (sample A) from the December 1970 administration of the
GRE was employed for study purposes. A second sample (sample B) consisting
of the answer sheets of 4,9l£ individuals from the same administration was
taken for validation purposes. Sample A was divided into two randomized block
groups of 2,500 (samples A_1 and A_2) by blocking on total GRE score. The
5,000 answer sheets were ordered in terms of the verbal score plus the
quantitative score and then randomly assigned to the two subsamples. This
increased the likelihood that the two split samples would be comparable in
terms of total score distributions. Each subtest was keyed against the
scores on its parallel form in sample A_1. The tests in sample A_2 were
then scored using these derived weights, and intercorrelations and alpha
coefficients were computed. Thus, all results reported are those obtained
with cross-validated weights.
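
For readers who want to see how an alpha coefficient of the kind mentioned above is computed from scored answer sheets, here is a minimal Python sketch. The function name and the tiny data set are hypothetical, and item scores are taken as given (in the study they would be the cross-validated option weights applied in sample A_2).

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Coefficient alpha for a list of examinee rows of item scores."""
    n_items = len(item_scores[0])
    # Variance of each item across examinees.
    item_vars = [
        pvariance([row[j] for row in item_scores]) for j in range(n_items)
    ]
    # Variance of the total score across examinees.
    total_var = pvariance([sum(row) for row in item_scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores of four examinees on three items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
alpha = coefficient_alpha(scores)
```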

The next step involved scoring the sample B answer sheets and computing
the single order and multiple correlations between the empirically keyed
tests and undergraduate GPA. Sample B was drawn from a total of 40 different
colleges. Within-school samples ranged from a low of 16 to a high of 399.
A modification of one of Tucker's (1965) central prediction methods was used
to pool data across colleges.²

Results and Discussion 

The results of the keying on parallel forms reliability and internal
consistency are presented in Tables 1 and 2. The proportional increases in
effective test lengths are comparable to those reported by Hendrickson (1971)
but less than those observed by Reilly and Jackson (1972). The smaller
increments observed for the quantitative tests are consistent with previous
findings, and may, as Hendrickson (1971) suggests, be related to the common
observation that differences in the quality of the distracters are less
apparent for general mathematical items than for verbal items.

Insert Tables 1 and 2 about here



Reilly and Jackson (1972) observed increases in the correlations
between verbal and quantitative tests when empirical weights were used and
attributed these increases to the capitalization of the keying procedure
on an omitting factor common to both tests. Thus, the results shown in
Table 3 are of interest since they indicate that when constrained weights
are used the large increases in verbal-quantitative correlations do not occur.
When increases in reliability are taken into account the increases are
actually slightly less than expected in two of the four cases shown and
slightly greater than expected in the remaining two cases.

Insert Table 3 about here

In Table 4, the correlations are shown between pairs of parallel subtests,
one scored with empirical weights and the other with formula weights. These
correlations are, in general, slightly higher than the parallel forms
reliability, in contrast to the uniformly lower values obtained when
unconstrained weights were used (Reilly & Jackson, 1972).

Insert Table 4 about here

The validity results are presented in Table 5. While the zero-order
validities for the quantitative forms are almost unchanged, the multiple
correlations are slightly lower overall owing primarily to the decreases
in the correlations between GPA and the empirically keyed verbal subtests.
It is difficult to explain why, even with the modified keying procedure, the
verbal test validities were lowered. Apparently, the empirically keyed
verbal tests are measuring some additional factors which, though reliable,
may not be valid.

Insert Table 5 about here

Conclusions 

While the results reported here certainly do not indicate that steps
should be taken to implement empirical option weighting, the findings are
not entirely discouraging either. It has been shown that a test can be made
more reliable and more homogeneous through option weighting and, at least for
the quantitative forms, without any appreciable lowering of validity.

Further research should be done on several key issues which have emerged
in this study. First, the issue of omitting behavior should be looked at
more closely. Green (1972) has presented data for the SAT which indicate
that "omit" scores are even more reliable than rights-only or formula scores.
It may be that an omitting score can be used as a suppressor variable along
with the formula score to increase the correlation with the criterion.

Another interesting and potentially useful study would be one which
examined the effects of keying options directly on the GPA criterion.
Examination of the weights for options may reveal consistent patterns which
could be helpful in guiding item writers.

Footnotes

¹The research reported herein was supported by the Graduate Record
Examinations Board.

²The method used is a least-squares procedure worked out by Robert F.
Boldt and is more fully described in a report by Briggs (1970).
Table 1

Cross-Validated Parallel Forms Reliabilities for
Empirically Keyed and Formula Scored Subtests

                 Formula    Empirically Keyed    K(a)

Verbal            .8909          .9242           1.49
Quantitative      .8742          .8892           1.16

(a) K gives the estimated proportional increase in test length
which would be necessary to yield the increased R's shown.
Rearranging the Spearman-Brown prophecy formula,

$$K = \frac{R_w (1 - R_F)}{R_F (1 - R_w)} ,$$

where $R_F$ is the R obtained with formula score weights and
$R_w$ is the cross-validated R obtained with empirical weights.
Table 2

Cross-Validated Internal Consistency Coefficients for
Formula Scored and Empirically Keyed Tests

          Formula    Empirically Keyed    K(a)

V_1        .8745          .9069           1.40
V_2        .8755          .9084           1.41
Q_1        .8515          .8817           1.30
Q_2        .8725          .8852           1.13

(a) K gives the estimated proportional increase in test length
which would be necessary to yield the increased α's shown.
Rearranging the Spearman-Brown prophecy formula,

$$K = \frac{\alpha_w (1 - \alpha_F)}{\alpha_F (1 - \alpha_w)} ,$$

where $\alpha_F$ is the α obtained with formula score weights and
$\alpha_w$ is the cross-validated α obtained with empirical weights.
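
The rearranged Spearman-Brown formula can be checked numerically. The short Python sketch below (the function name is an assumption made for illustration) computes K from a pair of coefficients and applies it to the Q_1 and Q_2 rows of Table 2.

```python
def length_factor(coef_formula, coef_weighted):
    """Proportional test lengthening needed to move the formula-score
    coefficient up to the empirically weighted one (Spearman-Brown)."""
    return (coef_weighted * (1 - coef_formula)
            / (coef_formula * (1 - coef_weighted)))


k_q1 = length_factor(0.8515, 0.8817)  # Q_1 row of Table 2
k_q2 = length_factor(0.8725, 0.8852)  # Q_2 row of Table 2
```

Rounded to two decimals these come out at 1.30 and 1.13, in agreement with the tabled values.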
Table 3

Intercorrelations between Verbal and Quantitative Forms
for Formula Scored and Empirically Keyed Tests

              Formula    Empirically Keyed    Expected(a)

V_1 Q_1                       .4577              .4269
V_2 Q_1        .4190          .4428              .4550
V_1 Q_2        .4079          .4304              .4191
V_2 Q_2        .4061          .4133              .4173

(a) The expected values represent the expected correlation which
should have resulted from the increased reliability of the empirical
key scores. These values were obtained by multiplying the true
formula score correlations between V and Q by the geometric mean
of the empirical key score reliabilities. Parallel forms reliabilities
were used in all cases.
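
The computation described in the note to Table 3 can be sketched in a few lines of Python. The function name is hypothetical, and the reliabilities are read from Table 1 as .8909 and .8742 (formula scores) and .9242 and .8892 (empirical keys); those readings, and the use of rounded tabled values, are assumptions of this sketch.

```python
from math import sqrt


def expected_correlation(r_formula, rel_formula, rel_empirical):
    """Expected V-Q correlation from the gain in reliability alone.

    rel_formula, rel_empirical -- (verbal, quantitative) reliabilities
    """
    # Correct the formula-score correlation for attenuation ...
    true_r = r_formula / sqrt(rel_formula[0] * rel_formula[1])
    # ... then attenuate it again with the empirical-key reliabilities.
    return true_r * sqrt(rel_empirical[0] * rel_empirical[1])


rel_f = (0.8909, 0.8742)
rel_w = (0.9242, 0.8892)
exp_v1q2 = expected_correlation(0.4079, rel_f, rel_w)  # V_1 Q_2 row
exp_v2q2 = expected_correlation(0.4061, rel_f, rel_w)  # V_2 Q_2 row
```

With these inputs the two results fall within a few ten-thousandths of the tabled expected values for the V_1 Q_2 and V_2 Q_2 rows; small differences are to be expected because the tabled reliabilities are themselves rounded.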
Table 4

Intercorrelations between Empirically Keyed
and Formula Scored Parallel Forms

                Parallel Forms        Empirically Keyed vs.
                 Reliability      Formula Scored Parallel Form(a)

                                       I           II

Verbal             .8909             .8953        .89H*
Quantitative       .8742             .8726        .8848

(a) Column I shows the correlation between form V_1 (Q_1) empirically
keyed and form V_2 (Q_2) formula scored. Column II shows the correlation
between V_2 (Q_2) empirically keyed and V_1 (Q_1) formula scored.
Table 5

Validity Coefficients(a) for Selected Pairs of Empirically
Weighted and Formula Scored Subtests

                            V_1     Q_1   V_1+Q_1    V_2     Q_2   V_2+Q_2

Formula Scores             .3167   .1909   .3184    .2939   .2054   .3013
Unconstrained Weights(b)   .2703   .1664   .2666    .2532   .150H   .2550
Constrained Weights        .2998   .1894   .2997    .2828   .2055   .2919

(a) Single order coefficients were estimated as follows:

multiple correlation coefficients were obtained using a pooling procedure
described by Briggs (1970).

(b) The unconstrained weights were those obtained by keying against
parallel forms (Reilly & Jackson, 1972).



APPENDIX 

First we solve for weights which minimize the least squares criterion
subject to the constraint that the weight for omit equals the mean of the
option weights. Let

$$F = \sum_j \sum_i (y_{ij} - w_j)^2 - 2\lambda \Big[ (k - 1) w_p - \sum_j \delta_j w_j \Big]$$

be the function to be minimized subject to the restriction that the weight
for one of the categories, $w_p$, equals the mean of the remaining $(k - 1)$
weights, where

$w_j$ is the weight for the $j$th category;

$y_{ij}$ is the criterion score for the $i$th person in the $j$th category;

$\delta_j$ is one if $j \neq p$, zero if $j = p$;

$\lambda$ is the Lagrange multiplier;

$k$ is the total number of categories;

$i$ is $1, \ldots, n_j$; and

$j$ is $1, \ldots, k$.

Take the partial derivative with respect to $w_j$ ($j \neq p$), writing
$n_j$ for the number of individuals in the $j$th category,

$$\frac{\partial F}{\partial w_j} = -2 \sum_i y_{ij} + 2 n_j w_j + 2\lambda .$$

Take the partial derivative with respect to $w_p$,

$$\frac{\partial F}{\partial w_p} = -2 \sum_i y_{ip} + 2 n_p w_p - 2(k - 1)\lambda .$$

Setting both equations equal to zero and multiplying by $(-\frac{1}{2})$, we have

$$\sum_i y_{ij} - n_j w_j - \lambda = 0 \qquad (1)$$

and

$$\sum_i y_{ip} - n_p w_p + (k - 1)\lambda = 0 . \qquad (2)$$

Taking (1) and summing over $j$,

$$\sum_j \delta_j \Big( \sum_i y_{ij} - n_j w_j - \lambda \Big) = 0 .$$

Rearranging,

$$\sum_j \delta_j n_j w_j = \sum_j \delta_j \sum_i y_{ij} - (k - 1)\lambda . \qquad (3)$$

Rearranging equation (2) similarly and adding to equation (3) we have

$$\sum_j n_j w_j = \sum_j \sum_i y_{ij} \qquad (4)$$

or,

$$\bar{w} = \bar{y} ,$$

a desirable result since the mean of the scores generated with the new
weights will always equal the criterion mean. Rearranging (2) we obtain

$$\lambda = \frac{1}{k - 1} \Big( n_p w_p - \sum_i y_{ip} \Big) .$$

Substituting this last result in equation (1) we have

$$\sum_i y_{ij} - n_j w_j - \frac{1}{k - 1} \Big( n_p w_p - \sum_i y_{ip} \Big) = 0 .$$

By the constraint, however,

$$w_p = \frac{\sum_j \delta_j w_j}{k - 1} ,$$

so that

$$\sum_i y_{ij} - n_j w_j - \frac{n_p}{(k - 1)^2} \sum_j \delta_j w_j + \frac{1}{k - 1} \sum_i y_{ip} = 0$$

and

$$n_j w_j + \frac{n_p}{(k - 1)^2} \sum_j \delta_j w_j = \sum_i y_{ij} + \frac{1}{k - 1} \sum_i y_{ip} .$$

Thus, we have $k - 1$ such equations and $k - 1$ unknown weights (the weight
$w_p$ is fixed at $\sum_j \delta_j w_j / (k - 1)$).

Let

$$q = \frac{n_p}{(k - 1)^2}$$

and construct the $(k - 1) \times (k - 1)$ matrix $X$ with diagonal elements
$(n_j + q)$, $j = 1, \ldots, p - 1, p + 1, \ldots, k$, and off-diagonal
elements $q$.

Let $W$ be a column vector of $k - 1$ weights, $w_j$,
$j = 1, \ldots, p - 1, p + 1, \ldots, k$, and let $Y$ be a $(k - 1) \times 1$
vector with elements

$$\sum_i y_{ij} + \frac{1}{k - 1} \sum_i y_{ip} , \qquad j = 1, \ldots, p - 1, p + 1, \ldots, k .$$

The equations can be represented in matrix form as follows:

$$XW = Y$$

and the solution

$$W = X^{-1} Y$$

is readily obtained.
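
A minimal Python sketch of this solution follows. The function name, argument layout, and data are hypothetical. Instead of a general matrix inversion, the sketch uses the fact that X is a diagonal matrix plus a constant q in every cell, so each equation reads n_j w_j + q S = Y_j, where S is the sum of the k - 1 option weights; that shortcut is a choice made for this illustration and is not part of the original derivation.

```python
def constrained_weights(sums, counts, omit):
    """Least-squares option weights with the omit weight tied to the
    mean of the other weights.

    sums[j]   -- sum of criterion scores of examinees in category j
    counts[j] -- number of examinees in category j (n_j), all nonzero
    omit      -- index p of the omit category
    """
    k = len(sums)
    options = [j for j in range(k) if j != omit]
    q = counts[omit] / (k - 1) ** 2
    # Right-hand side Y_j = sum_i y_ij + (1 / (k - 1)) * sum_i y_ip
    y = {j: sums[j] + sums[omit] / (k - 1) for j in options}
    # Row j of XW = Y is n_j * w_j + q * S = Y_j, with S the sum of the
    # option weights. Dividing by n_j and summing over rows yields S.
    s = (sum(y[j] / counts[j] for j in options)
         / (1 + q * sum(1 / counts[j] for j in options)))
    weights = [0.0] * k
    for j in options:
        weights[j] = (y[j] - q * s) / counts[j]
    weights[omit] = s / (k - 1)  # constraint: omit = mean option weight
    return weights


# Hypothetical item: three options plus an omit category (index 3).
sums = [300.0, 440.0, 180.0, 120.0]
counts = [10, 20, 10, 10]
weights = constrained_weights(sums, counts, omit=3)
```

For this made-up item the unconstrained category means are 30, 22, 18, and 12. The constrained weights pull the omit weight up to the average of the three option weights, and the weighted total of the weights still equals the total of the criterion scores, as equation (4) requires.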



Next we prove that the weights derived in the foregoing proof are
proportional to the weights which will maximize the correlation between
an item and a criterion subject to both the formula score constraint and
the constraint of some fixed variance, $B$. Let the objective function
to be maximized be

$$H = \sum_j n_j w_j \bar{y}_j - \frac{\lambda_1}{2} \Big( \sum_j n_j w_j^2 - NB \Big) + \lambda_2 \Big( \sum_j n_j w_j \Big) - \lambda_3 \Big( \sum_j \delta_j w_j - (k - 1) w_p \Big) ,$$

where

$$\bar{y}_j = \bar{y}_{\cdot j} - \bar{y}_{\cdot \cdot}$$

and where the Lagrange multipliers represent the following constraint
conditions:

($\lambda_1$) the variance of the weights when taken over all individuals
in the sample is equal to some constant $B$;

($\lambda_2$) the mean item score will be zero; and

($\lambda_3$) the $p$th category weight is the average of the other
$k - 1$ weights.

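Taking the partial derivative with respect to any $w_j$ ($j \neq p$) and
setting the result equal to zero we have

$$n_j \bar{y}_j - \lambda_1 n_j w_j + \lambda_2 n_j - \lambda_3 = 0 . \qquad (8)$$

Taking the partial derivative with respect to $w_p$ and setting the result
equal to zero we obtain

$$n_p \bar{y}_p - \lambda_1 n_p w_p + \lambda_2 n_p + (k - 1)\lambda_3 = 0 . \qquad (9)$$

Summing equations over $j$ we obtain

$$\sum_j n_j \bar{y}_j - \lambda_1 \sum_j n_j w_j + \lambda_2 N = 0 .$$

But, by constraint,

$$\sum_j n_j w_j = 0 ,$$

and by definition,

$$\sum_j n_j \bar{y}_j = 0 .$$

Thus,

$$\lambda_2 = 0 .$$

Solving for $\lambda_3$ in equation (9) and substituting the result in
equation (8) we have

$$\lambda_1 \Big( n_j w_j + \frac{n_p}{k - 1} w_p \Big) = n_j \bar{y}_j + \frac{n_p}{k - 1} \bar{y}_p . \qquad (10)$$

Since by constraint, however,

$$w_p = \frac{\sum_j \delta_j w_j}{k - 1} ,$$

$$\lambda_1 \Big( n_j w_j + \frac{n_p}{(k - 1)^2} \sum_j \delta_j w_j \Big) = n_j \bar{y}_j + \frac{n_p}{k - 1} \bar{y}_p .$$

Let the $X$ matrix and the $W$ vector be defined as in the previous
proof, and let $Y$ be a vector with elements

$$n_j \bar{y}_j + \frac{n_p}{k - 1} \bar{y}_p .$$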

Thus,

$$\lambda_1 X W = Y$$

and

$$W = \frac{1}{\lambda_1} X^{-1} Y .$$

We see that the solution is identical to that obtained previously
except for a proportionality constant. To find the proportionality constant
$\lambda_1$, let $G$ be the vector of $k - 1$ weights, with elements $g_j$,
where

$$G = X^{-1} Y .$$

By the constraint

$$\sum_j n_j w_j^2 = NB ,$$

but since

$$w_j = \frac{g_j}{\lambda_1} ,$$

$$\lambda_1^{-2} \sum_j n_j g_j^2 = NB$$

and

$$\lambda_1 = \sqrt{\frac{\sum_j n_j g_j^2}{NB}} .$$
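
The final rescaling step can be illustrated with a short Python sketch. The function name and the numbers are hypothetical; the weights g are assumed to already satisfy the zero-mean and omit constraints, and the sum is taken over all k categories.

```python
from math import sqrt


def rescale_to_variance(g, counts, b):
    """Return (lambda_1, w) with w_j = g_j / lambda_1 so that
    sum_j n_j * w_j**2 equals N * B."""
    n_total = sum(counts)
    lam = sqrt(sum(n * gj * gj for n, gj in zip(counts, g)) / (n_total * b))
    return lam, [gj / lam for gj in g]


# Hypothetical weights for three options and an omit category (last);
# the omit weight is the mean of the others and sum_j n_j * g_j is zero.
g = [3.0, -1.5, -1.5, 0.0]
counts = [10, 10, 10, 20]
lam, w = rescale_to_variance(g, counts, b=1.0)
```

Because every weight is divided by the same constant, the rescaled weights keep the same ordering and the same omit constraint as g; only their spread changes.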



